| Speech recognition | (audio clip) | \(\longrightarrow\) | Get your facts first, then you can distort them as you please. |
| Music generation | \(\emptyset\) | \(\longrightarrow\) | (music clip) |
| Sentiment classification | Great movie? Are you kidding me! Not worth the money. | \(\longrightarrow\) | (sentiment rating) |
| DNA sequence analysis | ACGGGGCCTACTGTCAACTG | \(\longrightarrow\) | AC GGGGCCTACTG TCAACTG |
| Machine translation | 网红脸 | \(\longrightarrow\) | Internet celebrity face |
| Video activity recognition | (video frames) | \(\longrightarrow\) | Running |
| Named entity recognition | Use Netlify and Hugo. | \(\longrightarrow\) | Use Netlify and Hugo. |
x: Use(\(x^{<1>}\)) Netlify(\(x^{<2>}\)) and(\(x^{<3>}\)) Hugo(\(x^{<4>}\)) .(\(x^{<5>}\))
y: 0 (\(y^{<1>}\)) 1(\(y^{<2>}\)) 0(\(y^{<3>}\)) 1(\(y^{<4>}\)) 0(\(y^{<5>}\))
\(x^{(i)<t>}\), \(T_x^{(i)}\) (\(i^{th}\) sample)
\(y^{(i)<t>}\), \(T_y^{(i)}\) (\(i^{th}\) sample)
\(\left[\begin{array}{c} a[1]\\ aaron[2]\\ \vdots\\ and[360]\\ \vdots\\ Hugo[4075]\\ \vdots\\ Netlify[5210]\\ \vdots\\ use[8320]\\ \vdots\\ Zulu[10000] \end{array}\right]\Longrightarrow use=\left[\begin{array}{c} 0\\ 0\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 1\\ \vdots\\ 0 \end{array}\right], Netlify=\left[\begin{array}{c} 0\\ 0\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 1\\ \vdots\\ 0\\ \vdots\\ 0 \end{array}\right], and=\left[\begin{array}{c} 0\\ 0\\ \vdots\\ 1\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 0 \end{array}\right], Hugo=\left[\begin{array}{c} 0\\ 0\\ \vdots\\ 0\\ \vdots\\ 1\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 0 \end{array}\right]\)
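The one-hot mapping above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a hypothetical seven-word toy vocabulary in place of the 10,000-word one; the names `vocab`, `word_to_index`, and `one_hot` are made up for the example, and indices here are 0-based.

```python
import numpy as np

# Hypothetical toy vocabulary (the notes use a 10,000-word one).
vocab = ["a", "aaron", "and", "hugo", "netlify", "use", "zulu"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word, word_to_index):
    """Return a vector of zeros with a single 1 at the word's vocabulary index."""
    vec = np.zeros(len(word_to_index))
    vec[word_to_index[word.lower()]] = 1.0
    return vec

sentence = ["Use", "Netlify", "and", "Hugo"]
# One row per word: X[t] plays the role of x^{<t+1>}.
X = np.stack([one_hot(word, word_to_index) for word in sentence])
print(X.shape)
print(X.argmax(axis=1))
```

Each row contains exactly one nonzero entry, so the representation carries no notion of similarity between words.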
\(a^{<0>}= \mathbf{0}\); \(a^{<1>} = g(W_{aa}a^{<0>} + W_{ax}x^{<1>} + b_a)\)
\(\hat{y}^{<1>} = g'(W_{ya}a^{<1>} + b_y)\)
\(a^{<t>} = g(W_{aa}a^{<t-1>} + W_{ax}x^{<t>} + b_a)\)
\(\hat{y}^{<t>} = g'(W_{ya}a^{<t>} + b_y)\)
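The recurrence above can be unrolled as a simple loop. The sketch below assumes \(g = \tanh\) and \(g' = \) sigmoid (a natural choice for the binary labels in the example, but an assumption here), and uses small random weights with hypothetical dimensions `n_a`, `n_x`, and `T_x`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(xs, Waa, Wax, Wya, ba, by):
    """Apply the RNN recurrence to a list of column vectors xs."""
    a = np.zeros((Waa.shape[0], 1))            # a^{<0>} is the zero vector
    y_hats = []
    for x_t in xs:
        a = np.tanh(Waa @ a + Wax @ x_t + ba)  # g = tanh (assumed)
        y_hats.append(sigmoid(Wya @ a + by))   # g' = sigmoid (assumed)
    return y_hats, a

# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
n_a, n_x, T_x = 4, 3, 5
Waa = rng.standard_normal((n_a, n_a)) * 0.1
Wax = rng.standard_normal((n_a, n_x)) * 0.1
Wya = rng.standard_normal((1, n_a)) * 0.1
ba = np.zeros((n_a, 1))
by = np.zeros((1, 1))

xs = [rng.standard_normal((n_x, 1)) for _ in range(T_x)]
y_hats, a_last = rnn_forward(xs, Waa, Wax, Wya, ba, by)
print(len(y_hats), a_last.shape)
```

The same weight matrices are reused at every time step; only the activation `a` changes as the sequence is consumed.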
\(L^{<t>}(\hat{y}^{<t>}, y^{<t>}) = -y^{<t>}\log(\hat{y}^{<t>}) - (1-y^{<t>})\log(1-\hat{y}^{<t>})\)
\(L(\hat{y}, y) = \sum_{t=1}^{T_y}L^{<t>} (\hat{y}^{<t>}, y^{<t>})\)
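As a worked instance of the sequence loss, the sketch below sums the per-step binary cross-entropy over the five labels from the example sentence. The predicted probabilities in `y_hat` are invented for illustration, and a production version would typically clip them away from 0 and 1 before taking logs.

```python
import numpy as np

def sequence_loss(y_hat, y):
    """Sum of per-time-step binary cross-entropy losses."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    per_step = -y * np.log(y_hat) - (1.0 - y) * np.log(1.0 - y_hat)
    return float(per_step.sum())

y = [0, 1, 0, 1, 0]                  # labels for "Use Netlify and Hugo ."
y_hat = [0.1, 0.8, 0.2, 0.7, 0.1]    # hypothetical model outputs
loss = sequence_loss(y_hat, y)
print(loss)
```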
\(a^{<t>}=g(W_a[a^{<t-1>}, x^{<t>}] +b_a)\), where \(g\) is typically \(\tanh\)
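The compact notation relies on stacking \(W_{aa}\) and \(W_{ax}\) side by side into \(W_a\) and stacking \(a^{<t-1>}\) on top of \(x^{<t>}\). The short check below, with hypothetical sizes and random values, suggests that the two forms give the same pre-activation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_x = 4, 3                          # hypothetical dimensions

Waa = rng.standard_normal((n_a, n_a))
Wax = rng.standard_normal((n_a, n_x))
a_prev = rng.standard_normal((n_a, 1))
x_t = rng.standard_normal((n_x, 1))

Wa = np.hstack([Waa, Wax])               # shape (n_a, n_a + n_x)
stacked = np.vstack([a_prev, x_t])       # shape (n_a + n_x, 1)

compact = Wa @ stacked
expanded = Waa @ a_prev + Wax @ x_t
same = np.allclose(compact, expanded)
print(Wa.shape, same)
```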
\(c^{<t>}=a^{<t>}\)
\(\tilde{c}^{<t>}=tanh(W_{c}[c^{<t-1>},x^{<t>}]+b_{c})\)
\(\Gamma_{u}=\sigma(W_{u}[c^{<t-1>},x^{<t>}]+b_{u})\)
\(c^{<t>} = \Gamma_u * \tilde{c}^{<t>}+(1-\Gamma_u)*c^{<t-1>}\)
\(c^{<t>}=a^{<t>}\)
\(\tilde{c}^{<t>}=tanh(W_{c}[\Gamma_r* c^{<t-1>},x^{<t>}]+b_{c})\)
\(\Gamma_{u}=\sigma(W_{u}[c^{<t-1>},x^{<t>}]+b_{u})\)
\(\Gamma_{r}=\sigma(W_{r}[c^{<t-1>},x^{<t>}]+b_{r})\)
\(c^{<t>} = \Gamma_u * \tilde{c}^{<t>}+(1-\Gamma_u)*c^{<t-1>}\)
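The full GRU update can be sketched as a single step function. This is an illustrative NumPy version with hypothetical dimensions and random weights, not a tuned implementation; `*` in the equations is taken to be element-wise multiplication.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x_t, Wc, Wu, Wr, bc, bu, br):
    """One GRU step; returns c^{<t>}, which also serves as a^{<t>}."""
    concat = np.vstack([c_prev, x_t])
    gamma_u = sigmoid(Wu @ concat + bu)      # update gate
    gamma_r = sigmoid(Wr @ concat + br)      # relevance gate
    c_tilde = np.tanh(Wc @ np.vstack([gamma_r * c_prev, x_t]) + bc)
    return gamma_u * c_tilde + (1.0 - gamma_u) * c_prev

# Hypothetical sizes and parameters, for illustration only.
rng = np.random.default_rng(0)
n_c, n_x = 4, 3
Wc, Wu, Wr = (rng.standard_normal((n_c, n_c + n_x)) * 0.1 for _ in range(3))
bc = np.zeros((n_c, 1))
br = np.zeros((n_c, 1))
c_prev = rng.standard_normal((n_c, 1))
x_t = rng.standard_normal((n_x, 1))

c_next = gru_step(c_prev, x_t, Wc, Wu, Wr, bc, np.zeros((n_c, 1)), br)
# A strongly negative update-gate bias drives Gamma_u toward 0,
# so the memory cell is carried over almost unchanged.
c_kept = gru_step(c_prev, x_t, Wc, Wu, Wr, bc, np.full((n_c, 1), -100.0), br)
print(np.allclose(c_kept, c_prev))
```

The second call illustrates why the update gate helps with long-range dependencies: when \(\Gamma_u \approx 0\), \(c^{<t>} \approx c^{<t-1>}\) and the stored value survives the step.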
\[ \begin{array}{ll} \text{GRU} & \text{LSTM}\\ \tilde{c}^{<t>}=\tanh(W_{c}[\Gamma_{r}*c^{<t-1>},x^{<t>}]+b_{c}) \qquad & \tilde{c}^{<t>}=\tanh(W_{c}[a^{<t-1>},x^{<t>}]+b_{c})\\ \Gamma_{u}=\sigma(W_{u}[c^{<t-1>},x^{<t>}]+b_{u}) \qquad & \Gamma_{u}=\sigma(W_{u}[a^{<t-1>},x^{<t>}]+b_{u})\\ \Gamma_{r}=\sigma(W_{r}[c^{<t-1>},x^{<t>}]+b_{r}) \qquad & \Gamma_{f}=\sigma(W_{f}[a^{<t-1>},x^{<t>}]+b_{f})\\ & \Gamma_{o}=\sigma(W_{o}[a^{<t-1>},x^{<t>}]+b_{o})\\ c^{<t>}=\Gamma_{u}*\tilde{c}^{<t>}+(1-\Gamma_{u})*c^{<t-1>} \qquad & c^{<t>}=\Gamma_{u}*\tilde{c}^{<t>}+\Gamma_{f}*c^{<t-1>}\\ a^{<t>}=c^{<t>} \qquad & a^{<t>}=\Gamma_{o}*\tanh(c^{<t>}) \end{array} \]
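The LSTM column can be sketched in the same style. Unlike the GRU, the gates are computed from \(a^{<t-1>}\) rather than \(c^{<t-1>}\), and separate update and forget gates replace \(\Gamma_u\) and \(1-\Gamma_u\). This sketch applies \(\tanh\) to \(c^{<t>}\) before the output gate, as in the standard LSTM formulation; sizes and weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, Wc, Wu, Wf, Wo, bc, bu, bf, bo):
    """One LSTM step; returns (a^{<t>}, c^{<t>})."""
    concat = np.vstack([a_prev, x_t])
    c_tilde = np.tanh(Wc @ concat + bc)      # candidate memory
    gamma_u = sigmoid(Wu @ concat + bu)      # update gate
    gamma_f = sigmoid(Wf @ concat + bf)      # forget gate
    gamma_o = sigmoid(Wo @ concat + bo)      # output gate
    c_t = gamma_u * c_tilde + gamma_f * c_prev
    a_t = gamma_o * np.tanh(c_t)             # standard form (assumed here)
    return a_t, c_t

# Hypothetical sizes and parameters, for illustration only.
rng = np.random.default_rng(0)
n_a, n_x = 4, 3
Wc, Wu, Wf, Wo = (rng.standard_normal((n_a, n_a + n_x)) * 0.1 for _ in range(4))
zeros = np.zeros((n_a, 1))
a_prev = rng.standard_normal((n_a, 1))
c_prev = rng.standard_normal((n_a, 1))
x_t = rng.standard_normal((n_x, 1))

a_t, c_t = lstm_step(a_prev, c_prev, x_t, Wc, Wu, Wf, Wo, zeros, zeros, zeros, zeros)
# Update gate near 0 and forget gate near 1: the cell keeps its old value.
_, c_kept = lstm_step(a_prev, c_prev, x_t, Wc, Wu, Wf, Wo,
                      zeros, np.full((n_a, 1), -100.0), np.full((n_a, 1), 100.0), zeros)
print(np.allclose(c_kept, c_prev))
```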
\(e_{man} - e_{woman} \approx e_{king} - e_{?}\)
\(\rightarrow \underset{w}{argmax} \{sim (e_{w}, e_{king} - e_{man} + e_{woman})\}\)
\(sim(e_w, e_{king}-e_{man}+e_{woman})\) = ?
Cosine similarity: \(sim(a,b) = \frac{a^{T}b}{ ||a||_{2} ||b||_{2}}\)
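The analogy search can be sketched directly from the two formulas: build the target vector \(e_{king} - e_{man} + e_{woman}\), then pick the word whose embedding has the highest cosine similarity to it. The two-dimensional embeddings below are invented for illustration (roughly a "gender" axis and a "royalty" axis); real embeddings are learned and have far more dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    """sim(a, b) = a^T b / (||a||_2 ||b||_2)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy embeddings: [gender, royalty].
E = {
    "man":   np.array([-1.0, 0.0]),
    "woman": np.array([1.0, 0.0]),
    "king":  np.array([-1.0, 1.0]),
    "queen": np.array([1.0, 1.0]),
    "apple": np.array([0.0, -1.0]),
}

def solve_analogy(a, b, c, E):
    """Find w such that a is to b as c is to w."""
    target = E[c] - E[a] + E[b]
    candidates = [w for w in E if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(E[w], target))

answer = solve_analogy("man", "woman", "king", E)
print(answer)
```

Excluding the three query words from the candidates is a common practical choice, since the target vector often lies closest to one of them.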
Mikolov et al., 2013, Linguistic regularities in continuous space word representations